
    Teaching data science by history: Kepler's laws of planetary motion and generalized linear models

    Teaching data science is challenging: it is a multidisciplinary subject that requires a solid mathematical background, and there are many models and approaches to consider. In our view, it is therefore important to present a unified approach to teaching the subject. We believe that one of the most effective ways to do so is through historical examples. An interesting historical example that explains Generalized Linear Models in prediction is the quest of the German astronomer Johann Kepler, at the beginning of the 17th century, to find a unifying law explaining the motion of the planets in our Solar System.
    Accepted manuscript
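    A minimal sketch of the kind of prediction problem the abstract alludes to, assuming the Kepler example centres on his third law (T² ∝ a³): taking logarithms turns the power law into a linear model, log T = c + k·log a, which ordinary least squares (a Gaussian linear model with identity link, the simplest member of the GLM family) can fit. The orbital values are rounded textbook figures and the function name is hypothetical; the paper's own treatment may differ.

```python
import math

# Rounded orbital data for the six classical planets:
# (semi-major axis a in astronomical units, sidereal period T in years)
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Hypothetical helper: least-squares fit of log T = c + k * log a.

    Returns the intercept c and the exponent k.
    """
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    c = mean_y - k * mean_x
    return c, k


c, k = fit_power_law(PLANETS.values())
print(f"exponent k = {k:.3f}, intercept c = {c:.4f}")
```

    If the third law holds, the fitted exponent k should come out very close to 1.5 and the intercept close to 0, because periods are measured in years and distances in astronomical units (Earth sits at the origin of the log-log plot).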

    DNA methylation meta-analysis confirms the division of the genome into two functional groups

    Based on a meta-analysis of human genome methylation data, we tested a theoretical model in which aging is explained by the redistribution of limited cellular resources between two main tasks of the organism: self-sustenance, based on the function of the housekeeping gene group (HG), and functional differentiation, provided by the integrative gene group (IntG). A meta-analysis of the methylation of 100 genes, 50 in the HG group and 50 in the IntG group, showed significant differences (p < 0.0001) between the two groups in the absolute methylation levels of gene bodies and their promoters. We found a reliable decrease in absolute methylation values in IntG with increasing age, in contrast to HG, where this level remained constant. The one-sided decrease in methylation in the IntG group is indirectly confirmed by an analysis of dispersion, which also decreased for the genes of this group. The imbalance between HG and IntG in methylation levels suggests that this IntG shift is a side effect of the ontogenetic program of growing up and the main cause of aging. The theoretical model of functional genome division also suggests a leading role for slowly dividing and post-mitotic cells in triggering and implementing the aging process.
    Published version

    MAD (about median) vs. quantile-based alternatives for classical standard deviation, skewness, and kurtosis

    In classical probability and statistics, many measures of interest are computed from the mean and standard deviation. However, the mean, and especially the standard deviation, are overly sensitive to outliers. One way to address this sensitivity is to consider alternative metrics for deviation, skewness, and kurtosis based on the mean absolute deviation from the median (MAD). We show that the proposed measures can be computed in terms of the sub-means of the appropriate left and right sub-ranges, and that they can be interpreted as average distances of the values in these sub-ranges from their respective medians. We emphasize that these measures use only the first-order moment within each sub-range and, in addition, are invariant to translation and scaling. The resulting formulas are similar to the quantile measures of deviation, skewness, and kurtosis but involve computing sub-means rather than quantiles. While the classical skewness can be unbounded, both the MAD-based and the quantile skewness always lie in the range [−1, 1]. In addition, while both the classical kurtosis and the quantile-based kurtosis can be unbounded, the proposed MAD-based alternative for kurtosis lies in the range [0, 1]. We present a detailed comparison of MAD-based, quantile-based, and classical metrics for six well-known theoretical distributions. We illustrate the practical utility of MAD-based metrics by considering the theoretical properties of the Pareto distribution, with its high concentration of density in the upper tail, as might apply to the analysis of wealth and income. In summary, the proposed MAD-based alternatives provide a universal scale for comparing deviation, skewness, and kurtosis across different distributions.
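    The link between MAD and sub-means can be checked numerically. For a continuous distribution with median M, splitting the mass at M into two halves of probability 1/2 gives E|X − M| = ½(μ_R − M) + ½(M − μ_L) = (μ_R − μ_L)/2, where μ_L and μ_R are the means of the left and right halves. The Python sketch below applies the same split to a sample (the middle value of an odd-length sample contributes half its weight to each side) and also computes a Bowley-style skewness with sub-means in place of quartiles. That skewness formula is an illustrative assumption that satisfies the stated properties (bounded in [−1, 1], invariant to translation and positive scaling); it is not necessarily the exact definition used in the paper, and all function names are hypothetical.

```python
import statistics


def half_submeans(data):
    """Return (median, lower sub-mean, upper sub-mean) of a sample.

    The sorted sample is split at the median into two halves of equal
    weight n/2; for odd n the middle value contributes half its weight
    to each half.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = statistics.median(xs)
    h = n // 2
    if n % 2 == 0:
        lower_sum, upper_sum = sum(xs[:h]), sum(xs[h:])
    else:
        mid = xs[h]
        lower_sum = sum(xs[:h]) + mid / 2
        upper_sum = sum(xs[h + 1:]) + mid / 2
    return m, lower_sum / (n / 2), upper_sum / (n / 2)


def mad_about_median(data):
    """Mean absolute deviation about the median, computed directly."""
    m = statistics.median(data)
    return sum(abs(x - m) for x in data) / len(data)


def submean_skewness(data):
    """Hypothetical Bowley-style skewness using sub-means instead of quartiles.

    Both deviations (mu_hi - m) and (m - mu_lo) are non-negative, so the
    ratio lies in [-1, 1]; it is unchanged by x -> a*x + b with a > 0.
    """
    m, mu_lo, mu_hi = half_submeans(data)
    spread = mu_hi - mu_lo
    if spread == 0:
        return 0.0
    return ((mu_hi - m) - (m - mu_lo)) / spread


sample = [1, 2, 3, 4, 100]
m, mu_lo, mu_hi = half_submeans(sample)
print(mad_about_median(sample), (mu_hi - mu_lo) / 2, submean_skewness(sample))
```

    For the sample [1, 2, 3, 4, 100], both routes give a MAD of 20.2, and the skewness is positive because the right sub-mean lies much farther from the median than the left one does.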

    A clustering-based approach to automatic harmonic analysis: an exploratory study of harmony and form in Mozart’s piano sonatas

    We implement a novel approach to automatic harmonic analysis that applies a clustering method to pitch-class vectors (chroma vectors). The advantage of this method is that it makes no top-down assumptions, which allows us to validate objectively the basic music-theory premise, presumed by most research in automatic harmonic analysis, of a chord lexicon consisting of triads and seventh chords. We use the discrete Fourier transform and hierarchical clustering to analyse features of the clustering solutions, and we illustrate associations between these features and the distribution of clusters over the sections of the sonata forms. We also analyse the transition matrix, recovering elements of harmonic function theory.
    Published version
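    Two ingredients named in the abstract can be sketched in a few lines of Python, under stated assumptions: the DFT magnitudes of a 12-bin chroma vector, which are unchanged when the vector is rotated (that is, when the music is transposed), and a first-order transition matrix over a sequence of cluster labels. The clustering step itself is omitted; the labels are assumed to come from some earlier clustering, and the function names are hypothetical rather than taken from the paper.

```python
import cmath


def chroma_dft_magnitudes(chroma):
    """Magnitudes |F_k|, k = 0..6, of the DFT of a 12-bin pitch-class vector.

    For real input, F_{12-k} is the conjugate of F_k, so k = 0..6 carries
    all the magnitude information.
    """
    if len(chroma) != 12:
        raise ValueError("expected 12 pitch classes")
    mags = []
    for k in range(7):
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * n / 12)
                    for n, c in enumerate(chroma))
        mags.append(abs(coeff))
    return mags


def transition_matrix(labels, n_states):
    """Row-normalised first-order transition probabilities between labels.

    labels: sequence of integer cluster labels in range(n_states).
    Rows for states that are never left stay all zero.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Pitch classes C=0, E=4, G=7: a C major triad as a binary chroma vector.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(chroma_dft_magnitudes(c_major))
print(transition_matrix([0, 1, 0, 1, 2], 3))
```

    A C major triad, for instance, has the same DFT magnitudes as a D major triad, because transposition only multiplies each coefficient by a unit-modulus phase factor.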

    A bi-directional adversarial explainability for decision support

    In this paper, we present an approach to creating a bi-directional Decision Support System (DSS) that acts as an intermediary between an expert (U) and a machine learning (ML) system when choosing an optimal solution. As a first step, the DSS analyzes the stability of the expert's decision and looks for critical values in the data that support that decision. If the decisions of the expert and the machine learning system still differ, the DSS attempts to explain the discrepancy. We give a detailed description of this approach with examples, and three studies illustrate some of its features.
    Accepted manuscript
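    The notion of critical values that support a decision can be made concrete in several ways. One minimal sketch, assuming a decision rule (the expert's or the model's) that flips at most once as a single feature varies within given bounds, is a bisection search for the flip point; the distance from the current value to that point then serves as a crude per-feature stability margin. The functions and the linear example rule are hypothetical illustrations, not the authors' DSS.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Decision = Callable[[Sequence[float]], bool]


def critical_value(decide: Decision, x: Sequence[float], feature: int,
                   lo: float, hi: float, tol: float = 1e-6) -> Optional[float]:
    """Bisection search for the value of one feature at which decide() flips.

    Assumes the decision changes at most once on [lo, hi] (monotone rule).
    Returns None if the decision is the same at both ends of the interval.
    """
    def at(v: float) -> bool:
        z: List[float] = list(x)
        z[feature] = v
        return decide(z)

    d_lo, d_hi = at(lo), at(hi)
    if d_lo == d_hi:
        return None
    # Invariant: at(lo) == d_lo and at(hi) == d_hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if at(mid) == d_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stability_margins(decide: Decision, x: Sequence[float],
                      bounds: Sequence[Tuple[float, float]]) -> Dict[int, Optional[float]]:
    """Distance from each feature's value to its critical value (None if no flip)."""
    margins: Dict[int, Optional[float]] = {}
    for i, (lo, hi) in enumerate(bounds):
        cv = critical_value(decide, x, i, lo, hi)
        margins[i] = None if cv is None else abs(x[i] - cv)
    return margins


def example_rule(z: Sequence[float]) -> bool:
    """Hypothetical linear decision rule used only for illustration."""
    return 0.6 * z[0] + 0.4 * z[1] >= 0.5


print(stability_margins(example_rule, [0.8, 0.5], [(0.0, 1.0), (0.0, 1.0)]))
```

    With example_rule and the input [0.8, 0.5], feature 0 flips the decision at about 0.5 and feature 1 at about 0.05, so the margins are roughly 0.3 and 0.45: the decision is more robust to changes in the second feature.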